
ARM Cores: Frequently Asked Questions

Last updated 29th June 2000

This page contains some Frequently Asked Questions about the ARM Cores.

To find information on a particular subject, simply search within this page using your browser (e.g. Edit->Find in page). Alternatively use the search button at the top of this page to search the whole of this ARM web site - this will also find other references to your chosen subject in (for example) Data Sheets and Application Notes located elsewhere on this site.

1. Reading or storing data and instructions:

2. Testing the ARM cores:

3. Initialisation and Operation of the ARM core:

4. Corrections to ARM7TDMI data sheet (ARM DDI 0029E):

5. Interrupt behaviour:

6. Interfacing:

7. Synthesisable ARM cores:

8. Cached ARM cores:


1. Reading or storing data and instructions:

What does the ARM core read / write when using non aligned addresses?

When an ARM instruction fetch takes place, the two least significant bits of the address are undefined. The memory controller should therefore treat every instruction fetch as aligned so that the instruction is fetched as expected, i.e. the memory system should ignore A[1:0] for an ARM instruction fetch. For Thumb instruction fetches, only A[0] should be ignored, because A[1] is needed to indicate the half-word address.

Unaligned accesses can only take place for data loads or stores. In this case, address bits [1:0] indicate which byte is addressed.

Note: In general, the use of unaligned addresses should be avoided. Instead, the appropriate instructions should be used to load/store words (LDR/STR), half-words (LDRH/STRH) or bytes (LDRB/STRB). The behaviour of unaligned data loads and stores depends on the implementation ('unpredictable').

The following is a description of how the ARM7TDMI and ARM9TDMI cores (and the processors which include these cores) behave; other cores may behave differently. How the ARM7 devices behave with unaligned data loads is described in the ARM7TDMI Data Sheet. For a general description please refer to the ARM Architecture Reference Manual.

Example:

Example data word: 0xAABBCCDD  (byte AA occupies bits [31:24], byte DD occupies bits [7:0])

Byte:                          AA  BB  CC  DD
Byte Address (Little Endian):   3   2   1   0
Byte Address (Big Endian):      0   1   2   3

The following example shows the actions for a Little Endian system when the 2 least significant bits of the address are [10], i.e. not word aligned:

For unaligned data stores:
a) Word store ( STR ): presented to the data bus: AA BB CC DD
A word store should generate a word aligned address. The word presented to the data bus is not modified even if the address is not word aligned.
The memory controller should ignore the two least significant bits of the address.

b) Half word store ( STRH ): presented to the data bus: CC DD CC DD
Register data bits [15:0] are duplicated across the data bus.
The memory controller should ignore the least significant bit of the address.

c) Byte store ( STRB ): presented to the data bus: DD DD DD DD
Register data bits [7:0] are duplicated on all four byte lanes of the data bus.
The memory controller needs all bits (incl. the two least significant bits) of the address.

For unaligned data read:
a) Word read ( LDR ): read into register: CC DD AA BB
The whole word is read, but in the ARM core the bytes are rotated such that the byte which is addressed is stored on [7:0].
The memory controller should ignore the two least significant bits of the address.

b) half word read ( LDRH ): read into register: 00 00 AA BB
The selected half word is placed on the bottom [15:0] bits in the register and the remaining bits are filled with zeros by the core.
The memory controller should ignore the least significant bit of the address.

c) byte read ( LDRB ): read into register: 00 00 00 BB
The selected byte is placed on bits [7:0] in the destination register and the remaining bits of the register are filled with zeros by the core.
The memory controller needs all bits (incl. the two least significant bits) of the address.

In general:
The result of all half word loads or stores (issued as ARM or Thumb instructions) with a non-halfword aligned address will be unpredictable.
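
Where a half-word has to be read from an address that may not be half-word aligned, assembling it from byte loads avoids the unpredictable behaviour described above. A minimal sketch, using the little-endian example data above (the register usage is illustrative):

        ; r1 holds the possibly unaligned address, here 0x02, so the bytes
        ; in memory are 0xBB at address 0x02 and 0xAA at address 0x03.
        LDRB    r0, [r1, #0]            ; r0 = 0x000000BB
        LDRB    r2, [r1, #1]            ; r2 = 0x000000AA
        ORR     r0, r0, r2, LSL #8      ; r0 = 0x0000AABB - well defined for any
                                        ; alignment, unlike an unaligned LDRH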

Important:
The memory controller must be able to handle word, half word and byte reads and writes!

If a memory controller is being designed to abort unaligned data transfer, it is essential that the signal nOPC is used to prevent instruction fetches from being aborted.

For further information, see the ARM7TDMI Data Sheet (ARM DDI 0029E), sections 4.9 and 4.10.

There are also some Application Notes which address related topics:

4. Programmer's Model for Big-Endian ARM
37. Startup configuration of ARM processors with MMUs
61. Big and Little Endian Byte Addressing


How does Little / Big Endian mode affect aligned / unaligned addressing?

The Endian configuration of your system has no effect if you always read and write whole words. It only matters when half-words or bytes are read or stored.

The following table can be found in the ARM9TDMI Data Sheet (rev 1), section 3.6, and applies both to the ARM9TDMI and the ARM7TDMI (and processors containing these cores). It shows which bits of the data bus are read into the least significant bits of the destination register.

Table 1: Endian effects for 16-bit data fetches (LDRH)
A[1:0] Little Endian (BIGEND=0) Big Endian (BIGEND=1)
00 D[15:0] D[31:16]
10 D[31:16] D[15:0]

Table 2: Endian effects for 8-bit data fetches (LDRB)
A[1:0] Little Endian (BIGEND=0) Big Endian (BIGEND=1)
00 D[7:0] D[31:24]
01 D[15:8] D[23:16]
10 D[23:16] D[15:8]
11 D[31:24] D[7:0]

Example:

Example data word: 0xAABBCCDD  (byte AA occupies bits [31:24], byte DD occupies bits [7:0])

Byte:                          AA  BB  CC  DD
Byte Address (Little Endian):   3   2   1   0
Byte Address (Big Endian):      0   1   2   3

If last 2 address bits are [00] (aligned transfer):
word read: Little Endian AA BB CC DD
  Big Endian AA BB CC DD
half word read: Little Endian: 00 00 CC DD
  Big Endian: 00 00 AA BB
byte read: Little Endian: 00 00 00 DD
  Big Endian: 00 00 00 AA

If the last 2 address bits are [10] (unaligned transfer)
word read: Little Endian CC DD AA BB
  Big Endian CC DD AA BB
half word read: Little Endian 00 00 AA BB
  Big Endian 00 00 CC DD
byte read: Little Endian 00 00 00 BB
  Big Endian 00 00 00 CC

For further information, see the ARM7TDMI Data Sheet (ARM DDI 0029E), section 4.10.4

There are also some Application Notes which address related topics:

4. Programmer's Model for Big-Endian ARM
37. Startup configuration of ARM processors with MMUs
61. Big and Little Endian Byte Addressing


How does the memory controller know whether the current access is an aligned or non-aligned word, half-word or byte access?

MAS[1:0] is used to indicate whether a word / half-word or byte access is to be performed, and is described on page 9-5 of the ARM7TDMI Data Sheet (ARM DDI 0029E). The signal has the following states:

MAS[1:0]  
Bit 1 Bit 0 Data size
0 0 byte
0 1 half-word
1 0 word
1 1 (reserved)

Important: The memory system must be able to handle word, half word and byte writes!

Memory systems supporting only word writes will have severe difficulties supporting C code because the compilers assume that the underlying access types of the ARM architecture are always available. Furthermore, it will not be possible to set software breakpoints in Thumb code using the EmbeddedICE Interface.

Instruction fetches:
When in Thumb state, A[0] is not driven, and will be held at whatever level it was last driven to, by a 'sticky latch'. Usually, this will be set following a BX instruction (with bit 0 of the register set), or a data transfer to/from an odd address. It would normally be cleared again following a data transfer to/from an even address.

The memory controller should ignore A[0] for Thumb instruction fetches (nOPC=0 and MAS[1:0]=01), and A[1:0] for ARM instruction fetches (nOPC=0 and MAS[1:0]=10).


Use of the uni-directional data buses in the ARM7TDMI

The signal BUSEN indicates the data bus configuration:

  • When BUSEN is HIGH, the two uni-directional data busses DIN[31:0] and DOUT[31:0] are used for transfer of data between processor and memory, and the bi-directional data bus must be left unconnected. Bus keepers are not necessary and should not be added.

  • When BUSEN is LOW, the bi-directional data bus D[31:0] is used for transfer of data between processor and memory. Any data presented on DIN[31:0] is ignored (but should be tied off), and DOUT[31:0] is driven to the value 0x00000000. Depending on the design of the memory system, bus keepers may be required on D[31:0].


What is the difference between a von Neumann architecture and a Harvard architecture?

A Harvard architecture has separate data and instruction busses, allowing transfers to be performed simultaneously on both busses. A von Neumann architecture has only one bus which is used for both data transfers and instruction fetches, and therefore data transfers and instruction fetches must be scheduled - they can not be performed at the same time.

It is possible, and sometimes done, to have two separate memory systems for a Harvard architecture. As long as data and instructions can be fed in at the same time, then it doesn't matter whether it comes from a cache or memory. But there are problems with this. Compilers generally embed data (literal pools) within the code, and it is often also necessary to be able to write to the instruction memory space, for example in the case of self modifying code, or, if an ARM debugger is used, to set software breakpoints in memory. If there are two completely separate, isolated memory systems, this is not possible. There must be some kind of bridge between the memory systems to allow this.

Using a simple, unified memory system together with a Harvard architecture is highly inefficient. Unless it is possible to feed data into both busses at the same time, it might be better to keep the design simple and stick to von Neumann architecture.

Use of caches

At higher clock speeds, caches are useful as the memory speed is proportionally slower. Harvard architectures tend to be targeted at higher performance systems, and so caches are nearly always used in such systems.

Von Neumann architectures usually have a single unified cache, which stores both instructions and data. The proportion of each in the cache is variable, which may be a good thing. It would in principle be possible to have separate instruction and data caches, storing data and instructions separately. This probably would not be very useful as it would only be possible to ever access one cache at a time. The ARM7xxT processor cores don't have separate instruction and data caches, but it is possible to define separate banks for instructions and data.

Caches for Harvard architectures are very useful. Such a system would have separate caches for each bus. Trying to use a shared cache on a Harvard architecture would be very inefficient, since only one bus could then be fed at a time. Having two caches means it is possible to feed both buses simultaneously, which is exactly what a Harvard architecture requires.

This also allows a very simple unified memory system to be used, with the same address space for both instructions and data, which gets around the problem of literal pools and self-modifying code. What it does mean, however, is that when starting with empty caches, instructions and data have to be fetched from the single memory system at the same time, so two memory accesses are needed before the core has everything it requires. Performance is then no better than a von Neumann architecture. However, as the caches fill up, it becomes much more likely that the instruction or data value is already cached, so only one of the two has to be fetched from memory; the other can be supplied directly from the cache with no additional delay. The best performance is achieved when both instructions and data are supplied by the caches, with no need to access external memory at all.

This is the most sensible compromise and the approach used by ARM's Harvard processor cores. Two completely separate memory systems can perform better, but would be difficult to implement.


2. Testing the ARM cores:

How do I drive other ARM7TDMI core input pins while using serialised test vectors via JTAG to test the core?

In general, when using the serial JTAG vectors to test the ARM core, only the signals MCLK, TBE and nRESET need to be driven to special states, as follows:
    MCLK        LOW
    TBE         HIGH for ICE-Tests, depends on system designs for other tests
    nRESET      HIGH
    
Important:
You must also ensure that the external system is isolated from the ARM7TDMI during the serial test to remove the possibility of bi-directional signals clashing on the data bus.

The document Serial Test Procedure explains the serial test procedure in detail.


Can production test vectors be used to determine the maximum core speed of the ARM?

The production test vectors are not designed for 'at speed' testing of the ARM cores. They are designed to give high fault coverage. The ARM cores sample inputs and change outputs on both the rising and falling clock edges. As the clock period gets shorter, outputs start to change in the next cycle. This problem is made worse on a tester because there is the possibility of having a write followed by a read, causing contention on the data bus.

Therefore it is not possible to use these vectors to carry out speed testing of the ARM cores. It may be possible to reduce the cycle time and scale the test vectors to a higher frequency (to reduce test time, for example), but the frequency reached in this way is not the actual maximum operating frequency of the ARM core.

ARM recommends the core is characterised using ARM's pre-fab characterisation simulations in combination with measurements on a test chip using special characterisation patterns.


Do the test vectors check the TAP controller ID code?

The TAP controller IDCODE is checked only by the parallel scan vectors.

The IDCODE is provided by 32 transistors with their sources tied to either Vdd or Gnd, and can be changed by altering one metal layer in the layout. If the IDCODE is changed by the ASIC designer, the netlist for layout verification as well as the test vectors must be modified to run without errors.

The length of the IDCODE is 32 bits, subdivided into 4 different fields:

Bits:    [31:28]     [27:12]          [11:1]               [0]
Field:   Version     Part Number      Manufacturing ID     1

Note that there is no parallel output from the ID register for the ARM7 family. The least significant bit of the register is scanned out first.

For the rev1 ARM7TDMI, the default IDCODE is 0x1F0F0F0F. The ID code for the rev1 ARM9 family processors is configurable by the ASIC designer.
For the IDCODE of other ARM processors, or for more detailed information, please contact us.


What is the timing relationship between TDI/SDIN and TDO/SDOUTBS in the ARM processors which include the EmbeddedICE Logic?

TDI is fed into a D-type Flip Flop, clocked by the rising edge of TCK. The output of this Flip Flop is presented on SDIN. If one of the user added scan chains is selected, SDOUTBS is fed asynchronously through from the user added scan chain to TDO.

  • The same applies to the ARM7TDMI, except that the signal referred to above as SDIN is called SDINBS on the ARM7TDMI.

  • Data is clocked out of the ARM core and out of each scan chain element on the rising edge of TCK.

  • Data is clocked into a scan chain element at the falling edge of TCK.
For more documentation, see also our Application Note: 28. The ARM7TDMI Debug Architecture


How can the ARM core be tested?

For the ARM7TDMI core, there are essentially 3 different approaches to testing:

1) Conventional parallel test:
ARM supplies 10 sets of parallel test vectors, 27k vectors in total, to test the functionality of the entire core. This test method requires access to all of the I/O signals of the ARM7TDMI core, which sometimes might not be possible or feasible for an ARM core embedded in an ASIC.

2) Using serialised test vectors:
ARM provides 'serialised' test vectors for the ARM7TDMI core to test it through the JTAG port.
This increases the number of vectors to over 900k. The advantage here is that only the 5 JTAG signals need to be accessible to the tester.

The TAP controller in the ARM processor core is a standard IEEE1149.1 compliant implementation. However, the scan cells used in the ARM7 core are not fully compliant, because they do not have an 'update' stage. The scan cells in the ARM9 cores are fully IEEE1149.1 compliant, because they do have an 'update' stage.

3) Using AMBA:
This technique uses ARM's standard on-chip bus specification, AMBA (Advanced Microcontroller Bus Architecture). If the user's ASIC design implements AMBA, a 32-bit bus architecture, then it is possible to apply the AMBA test methodology to the core. The parallel test vectors have been partitioned into 32-bit words which can be applied to the core through the 32-bit test interface of the ASIC if the AMBA test methodology has been adopted. The production test patterns for the ARM7TDMI have been converted into AMBA "TIC" (Test Interface Control) patterns, which amount to about 100k test vectors, with generally only 3 extra pins required.

For ASIC designs using AMBA, this is the recommended approach.


Comparison of production vectors.

Here is a vector count comparison between AMBA (TIC), Parallel, and Serialised (access through JTAG) production test vectors:

ARM7TDMI production vectors comparison:
  Test        TIC      Parallel   Serialised (JTAG)
  Tale        1225     220        26079
  Tarm        13065    2506       286695
  Tdebug *    18025    3527       65716
  Tice *      65150    13003      55183
  Tmul        13020    2584       286119
  Tpipe       5270     972        115339
  Tregnew     2105     420        68097
  Tthumb      3885     719        85002
  Total       121745   23951      988230

* The Debug and ICE test have a high proportion of pre-serialised sections.


How do I add scan chains to the ARM TAP controller?

If the designer wants to test additional devices on the ASIC, either scan chain 3 (boundary scan) can be used, or additional scan chains can be added:

The ARM7TDMI uses scan chains 0-4, 8 for internal purposes. Additionally, scan chain 15 is used by the system control coprocessor in the ARM710T / ARM720T. Therefore, for the ARM7 family, scan chains 5-7 and 10-14 can be used by the ASIC designer to test additional parts of the system.
For the ARM9 family, scan chains 16 to 31 can be used, while scan chains 0 to 15 are reserved for use by ARM.

Note: The ARM TAP controller is IEEE compliant. It is recommended to follow the IEEE1149.1 specifications when adding scan chains to the TAP Controller.

Latch-based scan cells similar to the following are used:

Latch A has two controlling inputs, SHCLKBS and ECAPCLKBS. It is effectively 2 latches, with a mux which selects whichever clock most recently changed. SHCLKBS and ECAPCLKBS are mutually exclusive signals.

The signals the designer needs to use for the BS cells are:

ARM7TDMI signal   Equivalent IEEE1149.1 signal
DRIVEBS           DriveOut
ECAPCLKBS         CaptureClockBS
ICAPCLKBS         CaptureClockBS
nHIGHZ            N/A
PCLKBS            PClockBS
RSTCLKBS          ResetClockBS
SHCLKBS           ShiftAClockBS
SHCLK2BS          L2ClockBS
SDINBS            'From last cell' to first cell in chain
SDOUTBS           'To next cell' from last cell in chain

If you are adding other scan chains (as opposed to just adding a boundary scan chain on scan-chain 3), you will also need to decode equivalent control signals from:

SCREG[3:0]    number of the scan chain currently selected
IR[3:0]       TAP controller instruction register
TAPSM[3:0]    TAP controller state machine
TCK1          test clock phase 1
TCK2          test clock phase 2
SDINBS        data out of the ARM core into the scan chain
SDOUTBS       data output from the scan chain (will need muxing with other scan chains)

Note that the ARM7TDMI is a latch-based design, and it is assumed that any additional scan cells will be latch-based, too. Latch-based scan cells are discussed in the JTAG specification, IEEE1149.1 Appendix A.

If a D-type based rather than a latch-based design is used, the designer may want to ignore SDINBS and use TDI instead (SDINBS is simply the output of a D-type with TDI at its input, clocked by TCK).

The timing of the boundary scan control signals (when in EXTEST), is:

Note the extra pulse on SHCLK2BS.

All ARM core models show the correct behaviour of the TAP Controller, so it is possible to determine the behaviour in more detail from simulations, if necessary.

Some other inputs and outputs related to JTAG & TAP controller are also provided, but these are not strictly JTAG signals. They are provided to make it easier to re-use the TAP controller to add scan chains and to implement an external boundary scan chain for the ASIC.

Multiplexing of JTAG pins:
With 2 TAP controllers, there are 3 possibilities:

1) Have 2 completely separate JTAG ports (i.e. nTRST, TDI, TCK, TMS, TDO for each one = 10 pins).

2) Have 1 set of pins and a mux pin to decide which JTAG port is accessed at any particular time. This still allows the ARM JTAG test vectors to work and allows the EmbeddedICE Interface or Multi-ICE to be used for debugging. Six pins are then required.

3) Daisy-chain the 2 JTAG ports together. This requires only 5 pins, but means that only Multi-ICE can be used for debugging, not the EmbeddedICE Interface. It will also require changes to be made to the ARM7TDMI serial vectors (if used). See Application Note 72, 'Multi-ICE System Design Considerations', in the ARM Application Notes.

Additional reading:

ARM7TDMI Data Sheet (ARM DDI 0029E)
ARM9TDMI Data Sheet (ARM DDI-0145A)
App. Note 28: The ARM7TDMI debug architecture


3. Initialisation and Operation of the ARM core:

What might an initial configuration of the ARM7TDMI look like?

Example initial configuration of the ARM7TDMI input signals:
Signal State Description Remarks
MCLK Driven Memory Clock Input Must be actively driven. If LOW, processor stops
ABE HIGH Address Bus Enable If HIGH, Address bus enabled. If LOW, Address bus is put in a high impedance state
DBE HIGH Data Bus Enable If HIGH, Data bus enabled. If LOW, Data bus is put into a high impedance state
TBE HIGH Test Bus Enable Enables data bus and address bus. Should be held HIGH under normal conditions. If LOW, Address bus, data bus, LOCK,MAS,nTRANS,nRW and nOPC are forced to high impedance state.
nENIN LOW Not Enable Input May be used with nENOUT to control the data bus during write cycles
DBGEN HIGH Debug Enable IMPORTANT! Must be high to allow the use of debug features and Multi-ICE or EmbeddedICE
DBGRQ LOW Debug Request If high, the ARM7TDMI enters debug state. If LOW program executes at system speed.
BREAKPT LOW Breakpoint If high, the current memory access is breakpointed. Should be LOW for normal execution.
EXTERN(1,0) HIGH / LOW External input Allows breakpoints or watchpoints to be dependent on external conditions. If high, the current memory access is breakpointed. Not used by debugger, by default.
CPA HIGH Coprocessor Absent IMPORTANT! Should always be HIGH unless external coprocessor is present
CPB HIGH Coprocessor busy IMPORTANT! Should always be HIGH unless external coprocessor is present.
ISYNC LOW Synchronous Interrupts IMPORTANT! If LOW, nIRQ and nFIQ interrupts are synchronised internally by the ARM core
BUSEN HIGH / LOW Data Bus configuration If HIGH, the unidirectional data buses are used, and the bi-directional data bus must be left unconnected.
DIN[31:0] HIGH / LOW Data In Tied off to either state if BUSEN is LOW and therefore the bidirectional bus is used. Driven by system if BUSEN is HIGH.
D[31:0] HIGH / LOW Bidirectional Data Bus Used by the system if BUSEN is LOW. IMPORTANT! Has to be left unconnected if BUSEN is HIGH
BIGEND HIGH / LOW Big Endian HIGH if system has Big Endian configuration, LOW if Little Endian configuration is used.
ALE HIGH Address Latch Enable Used to control the transparent latches on the address bus, when reading from byte wide memory. If LOW, address is frozen.
APE HIGH Address Pipeline Enable If HIGH allows the address bus to be pipelined, if LOW, address is put on the address bus in phase 1 of the actual cycle.
BL[3:0] HIGH Byte Latch Control Controls when data and instructions are latched from the external data bus. If HIGH, data are latched on the falling edge of MCLK.
ABORT LOW Memory Abort If HIGH tells the memory system that a requested address is not allowed.
nIRQ HIGH Not Interrupt Request Must be taken LOW to interrupt the processor. Appropriate disable bit must be clear.
nFIQ HIGH Not Fast Interrupt Request Must be taken LOW to interrupt the processor. Appropriate disable bit must be clear.
nWAIT HIGH not Wait nWAIT is ANDed with MCLK and must only change when MCLK is LOW. Must be tied HIGH if not used
nRESET LOW for Reset not Reset Has to be driven LOW to reset the ARM core. The processor will start again from address 0 when nRESET goes HIGH.
nTRST LOW not Test Reset May be tied to LOW if boundary scan interface is not to be used. Has to be connected to a pad if JTAG test methodology is used.
SDOUTBS LOW Boundary Scan Serial Output Data Should be tied to LOW if no external scan chains are supplied.
TDI HIGH Test Data Input Should be held HIGH if TAP controller is not active.
TMS HIGH Test Mode Select Should be held HIGH if TAP controller is not active.
TCK HIGH Test Clock Should be held HIGH if TAP controller is not active.


Reset after power up

It is good practice to reset a static device immediately on power-up, to remove any undefined conditions within the device which may otherwise combine to cause a DC path and thereby increase current consumption. Most systems are reset by using a simple RC circuit on the reset pin to remove the undefined states within devices whilst clocking the device.

Note that nRESET must be held asserted for a minimum of two MCLK cycles to fully reset the core. It is necessary to reset the EmbeddedICE Logic and the TAP controller as well, regardless of whether debug features are used or not. This is done by taking nTRST LOW for at least Tbsr.

During reset, the signals nMREQ and SEQ show internal cycles. After nRESET has been removed (i.e. taken HIGH), the ARM core does 2 further internal cycles before the first instruction is fetched from the reset vector (from 0x00). It then takes in total 3 MCLK cycles to advance this instruction through the fetch-decode-execute stages of the ARM instruction pipeline before this first instruction is executed, as shown in the diagram below.


How can the ARM banked registers be initialized?

The way to initialise the banked registers for the different modes is to enter each mode in turn and then perform the initialisation. At boot-up, these registers are indeterminate and should therefore be initialised to some known value before they are used. In particular, the stack pointer for each mode (r13) should be initialised before use.

Note that when using a model to describe the behaviour of the ARM core, the registers are initialized with the value 0xDEADDEAD.

Example code can be found in the ROM subdirectory of the examples provided with the ARM Software Development Toolkit and the ARM Developer Suite.
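
As an illustration, here is a minimal sketch of the usual sequence for setting up the stack pointers of two of the modes (the mode numbers and the I/F bit positions are architectural; the stack symbols are placeholders, not taken from the example code mentioned above):

Mode_SVC        EQU     0x13                    ; Supervisor mode
Mode_IRQ        EQU     0x12                    ; IRQ mode
I_Bit           EQU     0x80                    ; IRQ disable bit in the CPSR
F_Bit           EQU     0x40                    ; FIQ disable bit in the CPSR

                ; Enter IRQ mode with interrupts disabled and set its banked r13
                MSR     cpsr_c, #Mode_IRQ:OR:I_Bit:OR:F_Bit
                LDR     sp, =IRQ_Stack_Top      ; placeholder symbol

                ; Switch to Supervisor mode and set its banked r13
                MSR     cpsr_c, #Mode_SVC:OR:I_Bit:OR:F_Bit
                LDR     sp, =SVC_Stack_Top      ; placeholder symbol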


Is an internal (I) cycle always followed by a sequential (S) cycle?

No, an I cycle will not always be followed by an S cycle. It can also be followed by a further I cycle, or by an N cycle. As pointed out in the ARM7TDMI / ARM9TDMI data sheets, during an I cycle, the ARM does not require any memory access. It will output the current PC value onto the address bus, as this is the most probable address it will require next. This allows the memory interface to see the address a cycle before it is required.

As explained in the data sheet, this can be a benefit on some memory systems where a sequential address can be decoded (or the memory can be accessed) more quickly than a non-sequential address. As the address will remain the same between the I cycle and the S cycle, DRAM access can be started during the I cycle and then be completed in the S cycle, which may save 1 wait state.

However, in the situation where there is an internal cycle which changes the value of the pc (r15), the address output during the I cycle will not be correct. So the cycle after this will be an N cycle, using the new pc address. An instruction like LDR pc,[r0] or LDMFD sp!,{r0-r12,pc} would cause this. This type of instruction is often used to return from subroutines or C function calls, so this is quite a common case. Here, the memory decoder will look at the address during the I cycle and start to decode it. It will then sample nMREQ and SEQ (as it would normally), see that the next cycle will be an N cycle, and so ignore the address which was on A[] during the I cycle.

If you are designing a memory interface and are not using AMBA, see Application Note 29, Interfacing a memory system to the ARM7TDMI without using AMBA


4. Corrections to ARM7TDMI data sheet (ARM DDI 0029E):

ARM7TDMI Signal Description (Section 2.1):

Section 2.1 of the ARM7TDMI data sheet (ARM DDI 0029E) explains the signals for the ARM7TDMI. Unfortunately, the data sheet was not fully updated from silicon version rev0 to silicon version rev1, and in a few places the signal descriptions are inaccurate:

Name:        Type:     Description:
---------------------------------------------------------------------------------

COMMRX       O4        When HIGH, this signal denotes that the comms channel 
                       receive buffer is FULL. This signal 
                       changes on the rising edge of MCLK.

CPA          IC        A Coprocessor which is capable of performing the operation
                       that the ARM7TDMI is requesting (by asserting nCPI) should 
                       take CPA LOW immediately. If CPA is HIGH at the end of 
                       phase 1 of the cycle in which nCPI is LOW, the ARM7TDMI 
                       will abort the coprocessor handshake and take the undefined 
                       instruction trap, if CPB is HIGH as well.
                       If no coprocessor is connected to the ARM7TDMI, both CPA 
                       and CPB have to be tied HIGH.

DBE          IC        This is an input signal which, when driven LOW, puts 
                       the data bus D[31:0] into the high impedance state. 
                       It can be used for test or in shared bus systems. It 
                       should be held high to allow the ARM to output data.

DBGEN        IC        This input signal allows the debug features of the 
                       ARM7TDMI to be disabled. The signal must be high to 
                       allow the EmbeddedICE Logic to be used. It should be  
                       driven low only when debugging will not be required. 

SHCLK2BS     O4        ...
                       SHCLK2BS is used to clock the slave half of the external 
                       scan cells.     
                       ...


Software interrupt (Section 3.9.7):

.... A SWI handler should return by executing the following instruction, irrespective of the state (ARM or Thumb):
 MOVS  PC, LR   ; LR is R14_svc. 
This restores the CPSR and returns to the instruction following the SWI.
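
For reference, a minimal sketch of a SWI handler built around this return instruction. The SWI number extraction shown assumes the caller was in ARM state (a Thumb caller would need LR-2 and a different mask), and the register usage is illustrative:

SWI_Handler
        STMFD   sp!, {r0-r3, lr}        ; save working registers and R14_svc
        LDR     r0, [lr, #-4]           ; load the SWI instruction that was executed
        BIC     r0, r0, #0xFF000000     ; extract its 24-bit number field (ARM state)
        ; ... dispatch on the SWI number in r0 ...
        LDMFD   sp!, {r0-r3, lr}
        MOVS    pc, lr                  ; restore the CPSR and return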


Reset (Section 3.11):

When the nRESET signal goes LOW, ARM7TDMI abandons the executing instruction and then continues to fetch dummy instructions from incrementing word addresses with nMREQ=1 and SEQ=0 indicating internal cycles.


Little Endian offset addressing (Section 4.9.3, Table 4.15):

The second diagram in the table should read as follows:


The bidirectional data bus (Section 6.10.2, Table 6-3):

In Section 6.10.2 of the ARM7TDMI data sheet Table 6-3 describes which signals can be tristated using ABE, DBE or TBE. In the second row of this table, one tick mark is missing: TBE tristates both A[31:0] and D[31:0].

ARM7TDMI output   ABE   DBE   TBE
A[31:0]           Yes   -     Yes
D[31:0]           -     Yes   Yes


ARM7TDMI Testchip data bus circuit (Section 6.10.3, Figure 6-16):

In Section 6.10.3 of the ARM7TDMI data sheet an example is suggested to connect the ARM7TDMI to an external bus system.

To reduce data out time, a new bus turnaround circuit is suggested. The suggested circuit is available as pdf file: Improved bus turnaround circuit

Data are sampled into the ARM core on the falling edge of MCLK. nENOUT is an output signal from the ARM core, indicating a write to memory: During a data write cycle, nENOUT changes to LOW in phase 1 and stays LOW throughout phase 2 of the current clock cycle.

During a data write nENOUT is ORed with an external Data Bus Enable (optional) to generate nEN2, an active LOW enable to the output driver. At the same time, the input driver is disabled by using nENOUT ANDed with MCLK. This ensures that nEN1 is HIGH at all times during a data write, thus disabling the input driver.

During a data read, nENOUT and therefore nEN2 is HIGH, thus disabling the output driver. During phase 2, both nENOUT and MCLK are HIGH, thus enabling the data bus input driver.

Phase1 (MCLK=0) is used as a bus turnaround phase.

nENIN can now simply be tied LOW, to indicate to the ARM that it can drive data out as fast as possible. The requirement is then that the ARM does not drive the bus too early, since this might turn ON the ARM bus drivers before nEN1 turns OFF (a read followed by a write). For a write followed by a read there is no problem at all, since nEN2 will turn OFF at the start of the read cycle, but nEN1 will not turn ON until the end of the phase, when MCLK rises.

The data out time is now from MCLK falling to internal nENOUT. The nENIN to 'Data bus driven' time has been removed. This gains around 8ns on the data out time.

Care must be taken during JTAG operation.
When performing an EXTEST-CAPTURE instruction, MCLK must be held high.


System state determination (Section 8.10 / 8.11):

Section 8.11.2 of the ARM7TDMI data sheet (ARM DDI 0029E) explains how to determine the system status from debug state.

Unfortunately, the data sheet was not fully updated from Silicon Revision 0 to Silicon Revision 1, and in a few places, it should read RESTART instead of BYPASS.

For the Revision 0, the use of BYPASS was correct, but for Revision 1, the RESTART instruction was introduced, and must be used instead.

In the following sections, RESTART should be written instead of BYPASS:

Section 8.10.1 Clock switch during debug
At the bottom of this page, the last sentence should read:

"At this point, RESTART must be clocked into the TAP instruction register."

Section 8.10.2 Clock switch during test
The last paragraph should read:

"On exit from test, RESTART must be selected as the TAP controller instruction. When this is done, MCLK can be allowed to resume. After INTEST testing, care should be taken to ensure that the core is in a sensible state before switching back. The safest way to do this is to either select RESTART and then cause a system reset or to insert MOV PC,#0 into the instruction pipeline before switching back."

Section 8.11.2 Determining system status
1st sentence of 3rd paragraph should read:

"After the system speed instruction has been scanned into the data bus and clocked into the pipeline, the RESTART instruction must be loaded into the TAP controller."

The instruction RESTART is described in section 8.8.10 of the ARM7TDMI Data Sheet (ARM DDI 0029E)

Important for memory access at system speed:

The BREAKPT bit is set in the instruction before the one that is to be executed at system speed. After a load or store instruction at system speed has been executed, debug state is re-entered.

Leaving the debug state involves restoring the ARM7TDMI internal state, causing a branch to the next instruction to be executed, and synchronising back to MCLK. This means that the branch instruction will cause the pipeline to be flushed, and debug state will not be re-entered. Instead, the program will return to the address that was active at the time the core went into debug state, continuing with the execution of the program.


5. Interrupt behaviour:

What happens if an interrupt occurs as it is being disabled?

Description:

If an interrupt occurs at the same time as the interrupt is disabled by the program, the ARM7 family may not behave as expected. For example, if a sequence such as
        MRS     r0, cpsr
        ORR     r0, r0, #I_Bit   ;disable interrupts
        MSR     cpsr_c, r0       
is executed and an interrupt arrives during execution of the MSR instruction, the behaviour will be as follows:

  • The interrupt is latched
  • The MSR cpsr, r0 executes to completion setting the I bit in the CPSR
  • The interrupt is taken because the core was committed to taking the interrupt exception before the I bit was set in the CPSR.
  • The CPSR (with the I bit set) is moved to the SPSR_IRQ

This means that, on entry to the interrupt service routine, one can see the unusual effect that an interrupt has just been taken, while the I bit in the SPSR is set. This is known behaviour and generally does not cause a problem since an interrupt arriving just one cycle earlier would be expected to be taken. When the interrupt routine returns with an instruction like:
        SUBS    pc, lr, #4
the SPSR_IRQ is restored to the CPSR. The CPSR will now have the I bit set and therefore execution will continue with interrupts disabled.

However, in a number of cases this can cause problems for particular RTOS vendors in the following case:

The RTOS has a single piece of dispatch code which is called by the interrupt routine and also by some regular code which has interrupts disabled. On exit from the dispatch code the code examines the I bit of the SPSR to determine whether it should perform an interrupt return or a regular return. In this case the dispatch code may become confused and think it has been called from some regular code, and will perform an incorrect return.

Note: The same applies to FIQ interrupts!

Workaround:

The recommended workaround if your dispatch code does examine the disable bits in the SPSR is to add code similar to the following at the start of the interrupt routine.
        SUB     lr, lr, #4              ; Adjust LR to point to the return address
        STMFD   sp!, {..., lr}          ; Get some free regs
        MRS     lr, SPSR                ; See if we got an interrupt while
        TST     lr, #I_Bit              ; interrupts were disabled. 
        LDMNEFD sp!, {..., pc}^         ; If so, just return immediately.
                                        ; The interrupt will remain
                                        ; pending since we haven't
                                        ; acknowledged it and will be
                                        ; reissued when interrupts are next
                                        ; enabled.
        ...                             ; Rest of interrupt routine
If interrupt latency is critical, the test of the SPSR and return without acknowledging the interrupt should occur before the shared dispatch code is entered.


What happens if an interrupt occurs as it is being enabled?

Interrupts are enabled by clearing the I (for IRQ) or F (for FIQ) flags in the CPSR with an MSR instruction. If an interrupt occurs as it is being enabled, the instruction following the MSR instruction will still be executed.

The reason is that the new flags are only available to the control logic at the end of the execution stage of the MSR instruction. The next instruction will have already been decoded and enters the execution stage of the instruction pipeline just as the flags are being changed.
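
A minimal sketch of the enabling sequence discussed here, using the same I_Bit constant (0x80) as the earlier disabling example:

        MRS     r0, cpsr
        BIC     r0, r0, #I_Bit          ; clear the I bit
        MSR     cpsr_c, r0              ; IRQs are enabled from here on...
        ADD     r1, r1, #1              ; ...but this instruction still executes
                                        ; before any pending IRQ is taken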


What are the timing requirements of interrupts entering the ARM core?

Interrupts can be synchronous or asynchronous, depending on the 'ISYNC' pin on the core. If interrupts are asynchronous, they will be synchronised using the main processor clock (ECLK) before the interrupt is recognised.

When an interrupt is recognised by the ARM core, the core will finish executing the instruction which is currently in the execution stage of the ARM instruction pipeline before starting the interrupt sequence.

ARM has defined a standard programmer's model of an interrupt controller, as part of our Reference Peripherals Specification. However, your hardware may not necessarily implement this.


Are the IRQ & FIQ interrupts level-sensitive?

Yes. The nIRQ and nFIQ inputs are active low, and level sensitive. They should be driven low and kept low until the interrupt service routine (interrupt handler) acknowledges the exception, then the interrupt request pin should be taken high again.

The normal way this works is that the system will have some interrupt controller external to the ARM7TDMI, which takes the interrupt sources and drives the nIRQ pin, (or nFIQ). The interrupt service routine would then read a memory mapped register in the interrupt controller hardware, to find out which interrupt source was active. It would then write to the interrupt controller register to clear the interrupt (causing the nIRQ pin to be de-asserted) and in the case of a re-entrant interrupt handler, clear the CPSR 'I' bit.
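
As an illustration of that sequence, here is a minimal sketch of an IRQ handler for a hypothetical memory-mapped interrupt controller. The register names, addresses and layout are placeholders, not taken from any ARM document:

IntStatus       EQU     0x0A000000      ; placeholder: which sources are active
IntClear        EQU     0x0A000004      ; placeholder: write a 1 to acknowledge a source

IRQ_Handler
        SUB     lr, lr, #4              ; adjust the return address
        STMFD   sp!, {r0-r3, lr}
        LDR     r0, =IntStatus
        LDR     r1, [r0]                ; find out which source is active
        LDR     r0, =IntClear
        STR     r1, [r0]                ; acknowledge it, so nIRQ is de-asserted
        ; ... service the source identified in r1 ...
        LDMFD   sp!, {r0-r3, pc}^       ; return and restore the CPSR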


What happens inside the ARM core when an exception occurs?

When an exception occurs, the following happens inside the core:

1) The CPSR is copied to the SPSR of the mode being entered.
2) The CPSR bits are set as appropriate to the mode being entered, the core is set to ARM state, and the relevant interrupt disable flags are set*.
3) The appropriate set of banked registers are banked in.
4) The return address is stored to the link register (of the relevant mode)
5) The PC is set to the relevant vector address.

* There are two interrupt disable bits, one for FIQ, one for IRQ. When ANY exception occurs, the IRQ bit is set, to disable IRQ. If the exception was FIQ or Reset, then the FIQ disable bit is also set.
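
For reference, a minimal sketch of the vector table that step 5 above branches into (the vector addresses are architectural; the handler names are placeholders):

        AREA    Vectors, CODE, READONLY
        ENTRY
        B       Reset_Handler           ; 0x00  Reset
        B       Undef_Handler           ; 0x04  Undefined instruction
        B       SWI_Handler             ; 0x08  Software interrupt
        B       PAbt_Handler            ; 0x0C  Prefetch abort
        B       DAbt_Handler            ; 0x10  Data abort
        NOP                             ; 0x14  Reserved
        B       IRQ_Handler             ; 0x18  IRQ
        B       FIQ_Handler             ; 0x1C  FIQ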

The IRQ (or FIQ) handler should clear the source of the interrupt before re-enabling further IRQs. One must be very careful when re-enabling interrupts in your handler that you have taken the appropriate steps to allow for re-entrant IRQs (and FIQ). Chapter 9 of the SDT 2.50 User Guide provides a detailed discussion on how to write exception handlers.

See also the following entries elsewhere on this FAQ:
Writing interrupt handlers.
armcc: '__irq' should be used with care.
What happens if an interrupt occurs as it is being disabled?.


What happens if an interrupt occurs and the interrupt handler does not remove the interrupt?

Upon entry to the IRQ exception handler, the 'I' bit is set and further interrupts cannot be recognised by the core until the handler explicitly re-enables further interrupts by writing to the CPSR. As outlined in a previous entry: Are the IRQ (and FIQ) interrupts level-sensitive?, the IRQ handler should not do this until it has acknowledged the interrupt to whatever is driving the nIRQ input.

ARM has defined a standard programmer's model of an interrupt controller as part of our Reference Peripherals Specification; however, your hardware may not necessarily implement this.

There is also some detailed information and example code in the SDT2.50 User Guide, section 9.5.


Is there a priority scheme for exceptions?

When multiple exceptions are valid at the same time (i.e. more than one exception occurs during execution of an instruction), they are handled by the core (after completing execution of the current instruction) according to the following priority scheme.

Reset
Data Abort
FIQ
IRQ
Prefetch Abort
Undefined Instruction, SWI

The Undefined Instruction and SWI are both caused by an instruction entering the execution stage of the ARM instruction pipeline, so are mutually exclusive and cannot occur at the same time. Thus they have the same priority.

Please note the difference between prioritization of exceptions (when multiple exceptions are valid at the same time), and the actual exception handler code. Exception handlers are themselves liable to interruption by exceptions, and so you must be careful that your exception handlers do not cause further exceptions. If they do, then you must take steps to avoid infinite "exception loops" whereby the link register gets corrupted and points to the entry point of the exception handler, thus giving you no way back to your application code.

The following describes each exception individually.

1) Reset
This is the highest priority interrupt and shall be taken whenever it is signalled. The reset handler should initialise the system, and so there is no need to worry about state preservation etc. When reset is entered, IRQ and FIQ are disabled, and should not be enabled until all interrupt sources have been initialised to avoid spurious interrupts.

Reset is handled in Supervisor (SVC) mode. One of the very first things a reset handler should do is set up the stacks of all the other modes, in case an exception occurs later. An exception is unlikely to occur in the first few instructions of the reset handler, and no code should be placed there that could provoke one: a SWI, an undefined instruction or a risky memory access would be out of place. It is reasonable to assume that your reset handler has been hand crafted to map onto your system exactly, so that no exceptions take place during the handling of reset.

2) Data Abort
The Data Abort has a higher priority than FIQ so that the exception can be flagged and dealt with after the FIQ has been handled (if applicable). Data Abort disables IRQ (but not FIQ), and so the handler can be interrupted by an FIQ, but can only be interrupted by an IRQ after IRQs have been specifically re-enabled.
Again, it is unlikely that a SWI or an Undef instruction shall be executed as part of your handler (though it is possible, and the ARM shall enter the relevant mode and deal with that exception, before returning to the abort handler). If you have a prefetch abort, caused by a read error in your abort handler (e.g. the handler was placed in an area of memory that is not currently paged in by the memory controller), then the abort handler will be re-entered. Thus your abort handler should not cause further aborts.

3) FIQ
With the exception of Reset, this is the highest priority interrupt in terms of being handled. The FIQ exception shall disable all IRQs and FIQs and the handler should be hand crafted to execute as quickly as possible. The same arguments as above apply to Aborts, SWIs etc interrupting the handler.

Similarly, when an FIQ is detected, the ARM core automatically disables further FIQs and IRQs (the F and I bits in the CPSR are set for the duration of the FIQ handler). This means that an FIQ handler will *not* be interrupted by another FIQ or an IRQ, unless you specifically re-enable FIQ or IRQ.

For IRQ and FIQ, the default behaviour of the ARM core is to avoid nested (reentrant) interrupts.

4) IRQ
When an IRQ occurs, it shall be dealt with provided an FIQ or data abort has not been raised at the same time. IRQs are disabled (and should only be re-enabled after this current source has been cleared*), and are dealt with in the usual manner. As above, the handler code execution is prone to exceptions as per any other code.

*Please note that you must be very careful when re-enabling IRQs inside your IRQ handler. See section 9.5.2 of the SDT 2.50 Reference Guide for information.

When an IRQ is detected, the ARM core automatically disables further IRQs (the I bit in the CPSR is set for the duration of the IRQ handler). This means that an IRQ handler will *not* be interrupted by another IRQ, unless you specifically re-enable IRQ.

5) Prefetch Abort
If the instruction being executed was read in error, then it is flagged as causing a Prefetch Abort, but this exception is only taken if the instruction reaches the execution stage of the pipeline, and if none of the above exceptions have gone off at this point. IRQs shall be disabled, but other exception sources are enabled, and can be taken during the exception handler if necessary.

6) SWI
If the instruction has been fetched (and decoded) successfully, and none of the other exceptions have been flagged, and this instruction is a SWI instruction, then the ARM shall enter SVC mode and go into the SWI handler code. If the SWI calls another SWI, then the LR must be stacked away before the "child" SWI is branched to. This can be done in C code in SDT 2.50 by compiling with the -fz option. See section 9.4.2 of the SDT 2.50 Reference Guide for information. In ADS, -fz is the default behaviour.

7) Undefined Instruction
If the instruction has been fetched (and decoded) successfully, and none of the other exceptions have been flagged, and this instruction is an undefined instruction, then the ARM shall enter Undef mode and go into the undefined instruction handler code, which shall generally either offer the instruction to any coprocessors in the system, or flag an error in the system if none are present. SWI and Undefined Instruction have the same level of priority because they cannot occur at the same time: the instruction being executed cannot be both a SWI and an undefined instruction.


6. Interfacing:

Description of the Coprocessor interface of the ARM7TDMI

The following text briefly describes the coprocessor interface of the ARM7TDMI, and how a coprocessor should work.

Note: If no coprocessor is connected to the ARM7TDMI, both CPA and CPB have to be tied HIGH.

The coprocessor has to follow the pipeline of the ARM7TDMI. So it must have 3 stages (fetch, decode & execute), each holding one ARM instruction. The pipeline will advance each time the ARM does an instruction fetch, so the coprocessor pipeline stage will be controlled by (ECLK and NOT(nOPC)). At the decode stage of the pipeline, the coprocessor should examine the instruction opcode it has fetched. If it is a coprocessor instruction that it recognises, it must look to see if the nCPI ARM output goes low in the execution stage - if so, then the coprocessor instruction should be executed.

If the coprocessor just follows D[31:0], sees a relevant coprocessor instruction and then just waits for nCPI, there may be problems. For example, if the next instruction executed is an LDM of all 16 registers, it would be necessary to wait 20 clock cycles before nCPI goes low. One could just let the coprocessor wait for 20 clock cycles, but this could cause problems. If, for example, a branch occurs before this coprocessor instruction was executed and the program runs an instruction for a different coprocessor, both coprocessors may try to execute it simultaneously. It is also necessary to consider the effect of interrupts/aborts occurring just after the coprocessor instruction appears on the D[31:0] bus.

So, upon recognising a relevant instruction, one needs to count pulses of (ECLK and NOT (nOPC)) to count instruction pipeline advances. Only if nCPI goes low 2 pipeline advances after the coprocessor instruction was fetched should this instruction be executed.

Besides looking at nOPC, it may be useful to consider TBIT. Coprocessor instructions are not possible in Thumb state, so in order to save power the pipeline follower in the coprocessor could be switched off during fetching of Thumb instructions.

Once the coprocessor has recognised an instruction, it must drive CPA & CPB. When the ARM has a coprocessor instruction in its execution stage, it looks for CPA to go low. (If CPA is high, the undefined instruction trap is taken). If CPA is low, CPB is also checked. If CPB is high, the ARM will busy-wait until the coprocessor is ready to execute the instruction. During the busy-wait stage, the ARM will take an IRQ or FIQ if one occurs and the coprocessor instruction will be abandoned. This will be signaled to the coprocessor by nCPI going high. If CPB is low, the ARM will continue to fetch/execute subsequent instructions.

Other things to consider are:

  • Other coprocessors (e.g. internal coprocessors like CP0 for EmbeddedICE, CP14 for the Debug Comms Channel, and CP15 for the MMU/PU).
  • If more than one coprocessor is used, CPA and CPB from all coprocessors can be ANDed together. If required, open drain schemes can be used with pull-up resistors. These may be useful if coprocessors in other ASICs are added at board level.
  • Reset: it must be ensured that asserting nRESET takes CPA and CPB HIGH.

Timing of the coprocessor signals:

The timing of nOPC is dependent upon how one controls APE/ALE (i.e. same timing as A[31:0] bus). If APE=ALE=1, then nOPC will change during the clock high phase of the cycle before the actual data transfer takes place. D[31:0] is valid on the falling edge of MCLK.

nCPI changes off the falling edge of MCLK - the old value stays valid for time Tcpih and the propagation delay for the new value is given by the timing parameter Tcpi. The MCLK input to ECLK output propagation delay (Tcdel) also has to be taken into account.

CPA & CPB are sampled on the MCLK rising edge. Of course, it may be possible to generate these signals during the previous MCLK high phase if the pipeline is being followed. They will be sampled on every MCLK rising edge - the setup and hold times (Tcps and Tcph) have to be met.

The above picture shows an ARM7TDMI executing a coprocessor MCR instruction:
Cycle 1:  Fetch the instruction MOV R2,#2 (opcode E3A02002)
Cycle 2:  Fetch the instruction MCR cp4,0,r2,c1,c0 (opcode EE012410). Decode the MOV.
Cycle 3:  Execute the MOV, Decode the MCR and Fetch the next instruction.
Cycle 4:  nCPI goes low off the falling edge of MCLK. Notice that the coprocessor here has driven CPA & CPB without waiting for nCPI low. This is not mandatory. The propagation delay between the MCLK falling edge and nCPI valid is given by Tcpi. Also note that nMREQ & SEQ are now indicating that a coprocessor data transfer cycle will follow. nOPC goes high during the clock high phase (but this timing can be changed if APE or ALE are used to modify the A[31:0] timing). The propagation delay from MCLK rising to nOPC valid is given by Topcd.
Cycle 5:  The ARM now writes the value to be transferred to the coprocessor onto the D[31:0] bus. CPA & CPB have been driven high by the coprocessor to indicate the transfer has completed.


Memory mapping hardware registers on word boundaries

Description

It is strongly recommended that ARM based designs align memory mapped hardware registers on word (32-bit) boundaries, rather than sub-word boundaries. The main reason for this is to make the hardware interface easier to implement.

Solution

When reading 8 or 16 bit values from a peripheral, the data must be presented on the correct byte lane. For example, when the ARM reads an 8-bit register at address 0x2 it expects the data to be presented on data lines D[23:16]. This means that hardware is needed to route the data from the registers appropriately. If the ARM is in big endian mode then the byte lanes used will be different.

If all memory-mapped registers are on word boundaries then the data can be presented on D[7:0] or D[15:0] and no byte lane steering hardware is required. Writes do not matter so much because the ARM will write a byte on all byte lanes or a half-word on both halves to make interfacing easier.

In general, the hardware must ensure that 8, 16 and 32 bit accesses to memory and registers work correctly and that data is presented on the correct byte lane associated with the size and address of the transfer.
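
A minimal sketch of software access to such a layout; the peripheral and its addresses are placeholders, not a real device:

UART_DATA       EQU     0x0B000000      ; placeholder: 8-bit register on a word boundary
UART_CTRL       EQU     0x0B000004      ; placeholder: next register on the next word boundary

        LDR     r1, =UART_DATA
        LDRB    r0, [r1]                ; read data arrives on D[7:0], so no byte-lane
                                        ; steering is needed in the address decoder
        LDR     r1, =UART_CTRL
        MOV     r0, #0x01
        STRB    r0, [r1]                ; for the write, the core replicates the byte
                                        ; on all four byte lanes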

See Application Note 61, "Big and Little Endian Byte Addressing" in the ARM Application Notes.


7. Synthesisable ARM cores:

Differences between the ARM7TDMI-S and the ARM7TDMI

Documentation on the differences between the ARM7TDMI-S and the ARM7TDMI can be found in the ARM7TDMI-S Technical Reference Manual in Appendix B.


8. Cached ARM cores:

Which cached cores are available and what do they include?

The following shows the naming conventions for the cached ARM cores, and a list of cores that are currently available.

  • ARMx10T includes ARM7TDMI, cache and MMU
  • ARMx20T includes ARM7TDMI, ARM9TDMI or ARM10TDMI, cache, MMU and WinCE support
  • ARMx40T includes ARM7TDMI or ARM9TDMI, cache and protection unit